ABSTRACT

This paper describes a language-for-specific-purposes
test development project designed to assess both general language 
proficiency and classroom communicative competence for the purpose of 
accrediting teachers of Italian as a second/foreign language. A 
rationale for test design is presented that draws on a review of the
second language acquisition literature on teacher input and an 
analysis of teacher language behavior in foreign language classrooms 
where Italian is both medium and object of instruction. The nature of 
test tasks on the speaking component of the test is outlined. 
Eighteen language teacher experts were surveyed after they had viewed 
and rated 50 videotaped samples of trial test performance. This 
feedback provided overall support for the teacher-specific 
orientation of the test, but some informants expressed concern about 
the authenticity of particular tasks with respect to real world 
communication and about the validity of assessing the "teacherliness" 
of performance in the test environment. (Contains 36 references.) 
(AA) 
ASSESSING THE LANGUAGE PROFICIENCY OF SECOND
LANGUAGE TEACHERS: AN LSP APPROACH TO TEST DESIGN 

Paper to be presented at RELC Regional Seminar on
Language for Specific Purposes, April 1993
Catherine Elder

NLLIA LANGUAGE TESTING CENTRE, University of Melbourne 
Abstract

The paper describes an LSP test development project designed to assess both general language 
proficiency and classroom communicative competence for the purpose of accrediting teachers of 
Italian as a second/foreign language.

Drawing on a review of the SLA literature on teacher input and an analysis of teacher language
behaviour in foreign language classrooms where Italian is both medium and object of instruction,
the paper presents a rationale for test design and outlines the nature of test tasks on the speaking
component of the test.

To determine the test's acceptability, feedback was sought from language teacher experts after they 
had viewed and rated 50 videotaped samples of trial test performance. While this feedback provided 
overall support for the teacher-specific orientation of the test, some informants expressed concerns 
about the authenticity of particular tasks with respect to 'real world' classroom communication and
about the validity of assessing the 'teacherliness' of performance in the test environment. 

The paper offers a practical solution to these concerns but stresses the need for further research to 
validate the test's claim to measure specific purpose competence. 



1. THE TEST DEVELOPMENT PROJECT

The project described in this paper is currently being undertaken at the NLLIA Language Testing 
Centre at the University of Melbourne. The test which is being developed has two prime purposes: 

1) selection of candidates for L2 teacher education courses and diagnosis of training needs 
By making explicit the occupational language requirements of the foreign language teacher, this 
test serves to identify individual strengths and weaknesses which will assist teacher educators in 
selecting amongst applicants for L2 method training and in setting goals for language instruction. 

2) certifying foreign language teachers as employable in state primary schools.
In some parts of Australia L2 teacher certification is mandatory for trained generalist teachers who
lack the requisite foreign language qualifications (i.e. a post-Year 12 major sequence of study in
the relevant language), including those who have studied outside Australia and/or native speakers
whose language skills have been acquired "at the mother's knee" rather than through a process of
formal study.

The choice of Italian was determined by the fact that in Australia this is the language most 
commonly taught at primary level in both state and independent schools. The test format presented 
here is intended to serve as a blueprint for similar tests in other languages. The test, with some 
modifications to the items, is arguably also relevant to secondary L2 teachers and will in fact be
trialled on representatives of this population.

2. DEFINING THE DOMAIN

The first step in the LSP test development process is that of domain definition. Since the process
of observing and describing the complex behaviours and underlying skills which constitute the
professional role is both unwieldy and resource intensive, we turned to the SLA literature in order
to:
i) identify key features of 'teacher talk' which appear to contribute positively to classroom-based
second language learning and 

ii) establish a framework for our subsequent job analysis. 

2.1 Key aspects of 'teacher talk' in second/foreign language classrooms

Four aspects of language or language-related ability which can be regarded as central to the teacher
role are listed below:

(i) the ability to use the target language as medium and object of instruction

The monolingual principle of using the target language as both medium and object of classroom
instruction is supported by such writers as Wilkins (1974), Dulay, Burt & Krashen (1982), Swain
(1982) and Ellis (1984). Arguments are based on two premises: that the amount of exposure to the
target language is a key factor in determining levels of learner attainment, and that the quality of
language input also matters. Ellis (1984), for example, proposes that "framework-oriented"
interaction, which centres around classroom procedures, has the advantage of being more 'natural'
than the artificial language imported into the classroom for pedagogical purposes.

The quality of language input is also evaluated in terms of the opportunities it provides for learner 
output. Swain (1985) contends that progress in classroom second language learning is directly 
related to amounts of learner production and Ellis (1988) claims that certain types of teacher
communication are more conducive to such production than others. He proposes that "message-oriented"
talk (i.e. interaction focussing on the teaching of subject content that is part of the school
curriculum) will be more likely to stimulate meaningful output from second language learners than
"medium-oriented" talk (i.e. interaction aimed at teaching the target language). Likewise, "activity-oriented"
interaction (aimed at achieving student behaviours resulting in some non-verbal product)
offers opportunities for a wide range of learner-initiated speech acts.

On these grounds it seems reasonable to suggest that, while the second language teacher may 
resort to the L1 in certain situations, the more classroom functions the teacher is able to perform in
the target language the better. Some of these classroom functions will make very particular 
demands on language proficiency. For example, the use of the target language for procedural 
purposes will involve the teacher in issuing quite complex sets of directives and the teaching of 
curriculum content through the target language may require control of subject-specific discourse 
(e.g. the language required to explain mathematical or scientific processes). 

(ii) the ability to modify target language input in such a way as to render it comprehensible to 
learners 

There is evidence from a number of SLA studies that particular features of teacher-student talk
differ from speech addressed to linguistically competent adults. Larsen-Freeman & Long (1991)
characterize this talk as a 'regularized' version of the language whereby forms which constitute
exceptions to general rules are avoided. Thus teacher speech usually contains a more restricted
range of vocabulary (Arthur et al. 1980), a greater prevalence of high-frequency lexical items
(Chaudron 1983, 1987; Zobl 1983), a lower incidence of idiomatic usage (Henzl 1973, 1975, 1979)
and a tendency towards shorter, syntactically simplified or propositionally less complex
utterances (Gaies 1977; Scarcella & Higa 1981). Hatch (1983) also mentions a high incidence of
directives, frequent pauses, a reduced rate of speech and clear articulation as characteristic of
teacher input in second language classrooms. These speech adjustments, which closely resemble
those found in 'foreigner talk' generally, are claimed to facilitate learner comprehension
and are considered by Krashen (1980, 1981, 1982) to be a sine qua non of learner intake. More
recent evidence (Larsen-Freeman & Long 1991) suggests that modification of the interactional
structure of conversation (e.g. repetitions, topic fronting, paraphrase, decompositions, rhetorical
signalling) by teachers in response to perceived learner needs may be even more crucial for second
language acquisition than purely linguistic adjustments.

Attention to speed and clarity of articulation, to choice of lexis and to syntactic complexity, as
well as a high degree of linguistic flexibility to enable reformulation, simplification or elaboration
of discourse, are thus of key importance for the foreign/second language teacher.
(iii) the ability to produce well-formed input for learners

A number of studies suggest a relationship between the frequency of grammatical features occurring in
teacher input and accuracy levels in subsequent learner production (Hatch 1974, Lightbown 1980,
Larsen-Freeman 1976, Long 1981). Pidginized production from second language learners, on
the other hand, has been attributed to interaction with semi-proficient peers at the expense of
exposure to native-speaker-like input from the teacher (Plann 1977, Hammerly 1987, Harley &
Swain 1978). Cathcart-Strong (1986) points out that in foreign language classrooms teacher
reformulations and expansions of learner interlanguage may be the only source of well-formed
input available to learners and may thus be central to the acquisition process.

While the evidence from these studies is only tentative, it suggests that the teacher's role in
modelling correct forms in the target language may be an important one. Correctness, moreover, is 
not restricted to grammatical accuracy. One aspect of 'well-formedness' which receives less 
attention in the literature is pronunciation. In the absence of other models of the target language, it 
is likely that the quality of the teacher's pronunciation will have a bearing on the intelligibility of 
learner speech (Suter 1976). 

(iv) the ability to draw learners' attention to the formal features of the target language 
There is still uncertainty about the ways in which formal features of the target language are first
rendered salient to learners and subsequently incorporated within their interlanguage system, and 
there are relatively few studies which explore the impact of corrections on learner output 
(Chaudron, 1988). While it is generally agreed that positive grammatical evidence to learners is 
best provided in the form of naturally occurring samples of grammatical language, there are 
indications in the literature that explicit 'negative input' (i.e. signalling that the learner's output
deviates from native-speaker norms) can have a positive effect on the accuracy of learner 
production (e.g. Schachter 1983, Tomasello and Herron 1989). Schachter (1986) points out that 
in foreign language classrooms the teacher may be the only person equipped to provide this kind of
feedback. The provision of such feedback has implications not only for teacher training in 
classroom methodology, but also for teacher language proficiency. To talk about a foreign 
language in the foreign language, the teacher will require knowledge of metalinguistic terminology 
and command of particular language functions required to draw attention to rule violations and to 
provide explanations geared to the learner's level of understanding. 

2.2 Job analysis 

Four Italian programmes (3 primary and 1 junior secondary) were chosen as sites for observing 
foreign language teachers in action. The programmes were chosen to represent a range of 
approaches (partial-immersion, activity-based, grammar-based, thematic) and a range of grade 
levels. Each programme was well-established and involved experienced teachers who had been 
recommended for their high level of professionalism and for the fact that they conducted their 
lessons in the target language. The decision to observe and consult 'good' rather than randomly 
selected teachers was based on informal observations conducted by the researcher which revealed 
that the use of Italian as the medium of classroom instruction, or even as the prime vehicle of classroom
communication, was not the norm and that both the quantity and quality of Italian input provided
for learners in second language classrooms were not, if we believe what the literature has to tell us,
likely to promote second language development. It was felt that if the test were to be used as a
benchmark for professional training, it should offer a model of 'best practice' rather than reflect
the often questionable validity of the status quo.

On the basis of observations of one or two lessons at each site and subsequent discussions with the 5
participating teachers, an inventory was compiled listing functions performed by the teacher in the
target language (see Appendix A). These behaviours have been grouped into categories based
loosely on a goal-based framework of classroom interaction developed by Ellis (1984), which was
referred to in the literature review above. The framework, which has been adapted, divides
classroom talk into:
1. interaction oriented toward 'core' pedagogic goals

including: medium-oriented interaction¹
           message-oriented interaction
           activity-oriented interaction;

2. interaction which serves to create a framework within which teaching can take place;

3. extra-classroom use of the target language²

for: L2 lesson preparation
     communication with school community members
     professional development

The inventory makes no claims to be rigorous in its groupings (Ellis himself acknowledges the 
inevitable overlap between one category and another due to the fact that many language functions 
performed by the teacher are multipurposive) nor is it by any means exhaustive. A much more 
extensive study would need to be undertaken to cover all the possible interactions performed by 
second language teachers and to establish the microskills involved in each one. It can at best be 
regarded as partially indicative of the range of target language functions which the competent 
second language teacher in primary and junior secondary classrooms may be called upon to 
perform. What emerges most powerfully from this small survey (and this was supported by 
classroom observations) is the centrality of oral skills in teacher performance. The ability to read 
and write in Italian was nevertheless acknowledged by all informants as being of key importance to
their role, particularly in the lesson preparation phase.

3. SAMPLING FROM THE DOMAIN 

Since this taxonomy is descriptive rather than hierarchical, it is, as it stands, of limited use to the
test developer. Clearly not all tasks can be included on the test and many of them are impossible to 
model in the test situation in the absence of the learner. A number of principles were established to 
guide the process of sampling from the domain. They were as follows: 

♦scope 

It was decided that the test should include at least one task from each of Ellis' categories to ensure 
coverage of the target domain. All macroskills and a broad range of language functions were to be 
represented on the test. 

♦frequency of use 

It was on the basis of this principle that a decision was made to give preference to tasks mentioned
more than once by our informants (those marked with an asterisk on the inventory) and to give
greater weighting to spoken discourse in the design of the test.

♦importance

The teacher input literature reviewed earlier in this paper served as a basis for prioritizing particular 
aspects of teacher talk. Test tasks and assessment criteria were selected with these in mind. 

The specifications drawn up for the test represent a trade-off between each of the above principles. 
Although all four skills are assessed in the pilot version of the test, this paper will focus on the
speaking test only.

4. TEST DESIGN 

A brief outline of tasks included in the pilot version of the speaking test is shown in Appendix B. 

For the sake of coverage, tasks have been sampled from the "medium-oriented", "message-oriented",
"activity-oriented" and "framework-oriented" categories of communication. The entire
test is conducted in Italian. Phase 2A, the reading aloud task, provides raters with an opportunity
to focus on pronunciation independently of other features of language proficiency and at the same
time provides input for the subsequent story-retelling task. Phase 2B, the story-retelling task, serves
to elicit narrative discourse and, more importantly, involves the candidate in reformulating and
elaborating the linguistic input provided in the original reading task. Phase 3, the instruction-giving
task, focuses on the kind of concrete construction activity which is common in primary
language classrooms and taps the ability to use context-dependent language which is rich in
directives. Phase 4, the role-play task, elicits "framework-oriented" input from the candidate and
also provides opportunities for negotiated interaction. Phase 5, the culture-related presentation,
invites "message-oriented" input on cultural topics which are typically covered in primary and
secondary classrooms. In Phase 6, the error correction task, candidates are required to comment
on a piece of student writing containing a number of common errors and thereby to demonstrate
their metalinguistic abilities.

1 This category has been further subdivided by the researcher.

2 This last category is not included in Ellis' framework but has been added by the researcher to
cover areas of target language use regarded by informants as relevant to their role.

As is obvious from these abbreviated specifications, the examinee is required to assume the role of
classroom language teacher from Phase 2 of the interview until the end. The most obvious
limitation to task authenticity is the predominance of monologic discourse. The decision to limit
the "interactiveness" of test tasks, in spite of the demonstrated importance of negotiated interaction
to second language acquisition, was due to the fact that the examiners (there are two of them)
could by no stretch of the imagination be taken to be equivalent to the teacher's target audience,
namely a group of primary school-age second language learners with limited understanding and
control of the target language. The monologic tasks allow candidates to address a hypothetical
audience, thereby saving them the embarrassment of "talking down" to their assessors. With the
exception of Phases 1 and 5 of the test, the interviewer's main role is to expedite procedures rather
than to engage in conversational interaction with the candidate.

While the design of test tasks is intended to reflect in some measure the linguistic demands of the
classroom environment, the assessment criteria (see Appendix C) draw attention to the features of
language proficiency which the literature identifies as important for second language development.
Assessment criteria are of two types. First, the linguistic criteria, which are applied task by task,
assess pronunciation, grammatical accuracy, resources of expression, fluency and
comprehension. Assessments for each of these categories are made at least once, and in most
cases twice, during the course of the interview. Descriptions of performance at six levels of ability
are provided for each rating category. Second, the classroom communicative competence criteria
invite judgements about the 'teacherliness' of task performance or, in other words, the
quality of language production in terms of its suitability for the classroom. To assist with these
assessments, raters are provided with a list of questions such as "Was the pausing, phrasing and
pace of delivery appropriate?" and "Did the candidate tailor her language in such a way as to make
it intelligible to a child audience?" These classroom communicative competence assessments occur
three times during the interview and are measured on a defined four-point scale. Also, towards the
end of the interview, a metalanguage category is included to assess the quality of candidates'
explanations of learner error.
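To make the structure of the rating scheme concrete, the sketch below records one rater's judgements for one candidate. It is a minimal illustration in Python and is not part of the test materials. The class `RatingSheet`, its methods and the phase labels ("2B", "3" and so on) are hypothetical. The constraint that each phase receives at most one classroom competence rating is our own assumption. The metalanguage category is left out because its scale is not described here.

```python
from dataclasses import dataclass, field

# Criteria and scale sizes as described in the text above.
LINGUISTIC_CRITERIA = (
    "pronunciation",
    "grammatical accuracy",
    "resources of expression",
    "fluency",
    "comprehension",
)
LINGUISTIC_LEVELS = range(1, 7)  # six described levels of ability
CLASSROOM_LEVELS = range(1, 5)   # defined four-point scale
CLASSROOM_OCCASIONS = 3          # classroom ratings occur three times


@dataclass
class RatingSheet:
    """Hypothetical record of one rater's judgements for one candidate."""

    # (phase, criterion) -> level on the six-level linguistic scale
    linguistic: dict = field(default_factory=dict)
    # phase -> level on the four-point classroom competence scale
    classroom: dict = field(default_factory=dict)

    def rate_linguistic(self, phase: str, criterion: str, level: int) -> None:
        if criterion not in LINGUISTIC_CRITERIA:
            raise ValueError(f"unknown linguistic criterion: {criterion}")
        if level not in LINGUISTIC_LEVELS:
            raise ValueError("linguistic ratings use levels 1-6")
        self.linguistic[(phase, criterion)] = level

    def rate_classroom(self, phase: str, level: int) -> None:
        if level not in CLASSROOM_LEVELS:
            raise ValueError("classroom competence ratings use levels 1-4")
        # Assumption: one classroom rating per phase, three occasions in total.
        if phase not in self.classroom and len(self.classroom) >= CLASSROOM_OCCASIONS:
            raise ValueError("classroom competence is rated three times only")
        self.classroom[phase] = level

    def times_assessed(self, criterion: str) -> int:
        """Count how often a linguistic criterion has been rated so far."""
        return sum(1 for (_, c) in self.linguistic if c == criterion)
```

Rejecting out-of-range values when ratings are entered is one way to keep the six-level linguistic scale and the four-point classroom scale from being confused when ratings are collated.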

At the end of the interview, assessors are asked to give two global or summative ratings: one for
general language proficiency on a scale defined at four points, and another for overall
teacher competence, also on a four-point scale. These last two criteria make it possible to gauge the
relative influence of the various assessment categories on overall judgements of language
proficiency and assist in determining a minimum language proficiency threshold for effective
classroom performance.

5. ASSESSOR FEEDBACK

After trialling the speaking test on a sample of 50 candidates and videotaping their performance,
feedback was sought from a group of 18 experienced Italian teachers/teacher trainers who attended
an initial briefing session. Multiple copies of all 50 samples of test performance were divided
amongst the assessors, all of whom viewed a minimum of 12 tapes and rated them according to
the specified criteria. They then filled out a questionnaire requiring them to evaluate each test task
(a) as a measure of language proficiency and (b) as a measure of classroom communicative
competence. Suggestions for improvement were also requested. The results of this survey are
tabulated below.
Table 1. Test tasks as a measure of language proficiency

TASK                              Appropriate    Inappropriate
Phase 2A  Story Reading                53%              47%
Phase 2B  Story Retelling             100%               0%
Phase 3   Instruction Giving          100%               0%
Phase 4   Role-play                    86%              14%
Phase 5   Cultural Presentation        85%              15%
Phase 6   Error Correction             92%               8%

Table 2. Test tasks as a measure of classroom competence

TASK                              Appropriate    Inappropriate
Phase 2A  Story Reading                92%               8%
Phase 2B  Story Retelling             100%               0%
Phase 3   Instruction Giving          100%               0%
Phase 4   Role-play                    92%               8%
Phase 5   Cultural Presentation        85%              15%
Phase 6   Error Correction             92%               8%


As can be seen from these figures, an overwhelming majority of responses indicated that test tasks
were perceived to be suitable measures of both language proficiency and classroom
communicative competence. While some assessors felt that the "reading aloud" task was not a
measure of language proficiency as such, they felt that the inclusion of this task was legitimate
given its relevance to the foreign language teacher's role. There were more reservations about the
cultural presentation than about any other phase of the test. One or two raters criticized the topics and/or
input materials provided for this task, but the main objection was the difficulty of assessing
language proficiency independently of the background knowledge required for successful
performance of the task. Criticisms of the role-play task were mainly directed towards the
interviewer, who was seen to be "working too hard at the expense of the candidate controlling the
discourse flow". Interestingly, the assessors were unanimous in support of the monologic
story-retelling and instruction-giving tasks, which suggests (as we have already intimated) that it may be
easier to sustain the illusion of the candidate as teacher in the absence of interviewer input. The
error correction exercise was well received by all but one of the informants. The dissenting voice
was that of an experienced primary teacher who, while recognizing the importance of grammatical
knowledge in L2 teaching, objected to the task on the following grounds: "I would never do this in
the classroom. I would elicit explanations from learners rather than providing them myself." A
similar uneasiness about the authenticity of test tasks is reflected in suggestions for improvement
made by a number of informants who called for more props (e.g. blackboard and board marker,
illustrative material) to enhance the 'teacherliness' of the presentation and more explicit
instructions to candidates about the nature of the learner audience.
6. DISCUSSION

These latter comments highlight a problem which is germane to LSP testing generally. In the
attempt to produce high-fidelity simulations of the contextual specifics of real-life performance,
there is a risk of obscuring the fundamental purpose of the test encounter, which is to produce a
sample of language from which inferences can be drawn. The more life-like the task, the greater the
likelihood that it will be valued per se as the target of assessment (Messick 1992) rather than as a
vehicle for eliciting information which is generalizable to a range of other situations. The error
correction task is best seen as a pretext for determining whether candidates have the capacity to
provide feedback to learners, rather than as a sample lesson in its own right. Error feedback
strategies, after all, will vary from learner to learner and no single performance can encompass all
possible approaches, especially when the learners are not present at the interview. Likewise, a
very detailed specification of the age and proficiency level of the class members to whom
communication is directed may prevent extrapolation to other classroom situations. It is for this
reason that Doye advocates that "we should endeavour to employ just the amount of realism that
makes a task understandable and plausible, but no more" (1991: 106).

One of our informants went so far as to suggest that the very quest for authenticity was 
inappropriate. She cautioned that those who took the teacher role simulation seriously and 
attempted to produce comprehensible input for a hypothesized semi-proficient L2 audience ran the 
risk of masking their true level of language proficiency through lexical and syntactical 
simplification of their utterances and through uncharacteristically slow rates of delivery. She
thereby drew attention to what may indeed be a fundamental incompatibility between general 
proficiency language testing, which assumes a developmental continuum involving an incremental 
increase in range and complexity of language use as proficiency progresses, and certain kinds of 
occupation-specific testing where simplicity, clarity and sensitivity to audience may be valued over 
and above elaborateness. In a test such as the one we have developed it is conceivable that native 
speakers, understandably anxious to 'show off' their level of linguistic sophistication, may be
outperformed by less proficient speakers who are more responsive to the specific demands of the 
criterion domain. Since the test is to be used for selection, there are issues of social equity to be
considered. Is it reasonable to demand behaviour of nonnative speakers that native speakers 
cannot or do not demonstrate? Furthermore, given the difficulty of replicating the contextual 
features of the classroom in a test environment, can we regard an unconvincing role simulation on 
our test as indicative of inability to perform in the real world? 

Our practical solution to these concerns has been to separate classroom competence ratings from 
linguistic ones in reporting performance on the test. This will ensure that ability estimates for 
candidates demonstrating high levels of language proficiency will not be unduly influenced by
failure to act out the teacher's role effectively in the test situation. But this amounts to a weakening 
of the test's claim to specificity. If information about general language proficiency is enough, why 
bother with measurement of classroom-specific competence? 
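The reporting approach described above can be sketched as a profile in which the two kinds of rating are never combined. The function `report_profile` is hypothetical. Taking a simple mean of ordinal ratings is an assumption made for illustration and is not the scoring procedure of the test.

```python
from statistics import mean


def report_profile(linguistic_ratings, classroom_ratings):
    """Report language proficiency and classroom competence separately.

    Simple means are used here purely for illustration. Each component stays
    on its own scale instead of being folded into one composite score.
    """
    return {
        "language proficiency (1-6)": round(mean(linguistic_ratings), 2),
        "classroom competence (1-4)": round(mean(classroom_ratings), 2),
    }


# Hypothetical candidate: strong linguistically, unconvincing in the role simulation.
profile = report_profile([5, 5, 6], [2, 2, 1])
print(profile)
```

For this invented candidate, folding all six ratings into a single mean would give 3.5. That figure would obscure the strong linguistic performance, which is exactly the effect that separate reporting is meant to avoid.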

If we are to take seriously the test's LSP orientation, its claim to measure something other than 
general language proficiency requires empirical validation. There are a number of possible 
approaches to the validation process. First, we can investigate the degree of 'fit' between ratings
assigned for linguistic categories as opposed to those assigned for classroom communicative
competence to determine whether the latter make an independent contribution to overall estimates
of ability³. If they do, this would suggest that classroom considerations are worth including on the
test. Alternatively, we could look for evidence of 'teacher talk' as a distinguishing feature in test
language behaviour by analysing samples of discourse at a range of proficiency levels. Positive
findings would indicate that, in spite of constraints on authenticity, the design of our test yields 
information which is of direct relevance to the teaching situation. In the absence of such evidence 
the only justification for the test's specificity is the perceived relevance of test tasks to the domain. 
While the perceptions of language experts must be taken into account, they are not in themselves a 
sufficient basis for the interpretation and use of test scores. 



3 The application of Item Response Theory models allows for this kind of analysis.